I. INTRODUCTION 



The last decade has seen the general acceptance of the importance 
of syntax description and syntactical analysis in the development of, 
and support for, new automatic programming languages for digital 
computers. 

Since the introduction of the stored program computer, communication 
between man and his computer has been a problem. The laborious and time 
consuming numeric coding of machine language inhibited many potential 
computer users from learning how to communicate directly with the 
computer. The man with a computer-solvable problem often found manual 
calculations easier than trying to communicate with either the expert 
programmer or the machine. A means of communication with the computer, 
more easily learned than machine language, was needed. This led to the 
development of problem-oriented languages. 

Problem-oriented languages such as ALGOL, PL/1, and FORTRAN were 
designed to facilitate communication between user and computer for 
solutions to mathematical and other special-interest problems. 

These automatic programming languages generally consisted of a 
vocabulary which incorporated many key words from a profession or an 
interest area. The user could instruct the computer in a language 
similar to ordinary English usage. For example, instead of trying 
to manipulate an array by laboriously coding a sequence of instructions 
to achieve the transpose of a matrix, the user could achieve the same 
goal by the use of a special reserved word, such as "TRANSPOSE". It 
should be noted, however, that these special reserved words, such as 
"TRANSPOSE", have a specific, nonredundant meaning. The programmer 
must use these words in accordance with the coding restrictions of the 
particular language in which he is programming. 

The advent of the automatic programming languages attracted more 
users to the computer. These new users, as they discovered more 
applications for the computer, requested more languages specifically 
designed for their particular interests. Computer specialists 
responded by developing more new languages. 

The introduction of time sharing systems, with many terminals, 
magnified the problem of satisfying many users and their programming 
language requirements. Whereas previously only the large corporations 
could afford a computer installation, now many small businesses were 
able to share a central installation. The over-all effect provided 
a large body of users, solving computer problems with a wide variety 
of special purpose languages. 

The cost of these time sharing systems is distributed among the 
users. Each user is charged for the amount of computer processor 
time used by his program. Obviously, the user is interested in 
minimizing his processor time, thus decreasing costs and increasing 
profits. However, many of the users of time sharing systems are not 
completely familiar with the restrictions of the particular language 
in use. As a result, they use the computer to "trouble shoot" their 
programs for syntax errors. An experienced programmer finds that 
programs seldom compile correctly on the first attempt. Yet compilation 
of entire programs is attempted on each occasion with the 
associated consumption of expensive processor time. The need is 
apparent for a tool that will assist the programmer in eliminating 
syntax errors, while minimizing expensive processor time. 

A by-product of reducing the number of attempts to compile is 
the additional processor time available. This time can be used to 
either improve response time to the current users, or to provide 
service to new users. 

Thus the objective of this thesis is to provide a basic universal 
syntax checking program which could be expanded for use in a 
time-sharing environment. This program is hereafter referred to as the 
Universal Syntax Checker. The syntax checker accepts the description 
of any language and provides a syntax check of programs written in the 
described language, as shown in Figure 1-1. The syntax checker 
requires fewer facility resources than a language processor, generates 
no code, and is intended for use in a time-sharing environment where 
the various users may time-share the universal syntax checker 
regardless of the language in use at each terminal. 

Figure 1-1. Universal Syntax Checker 

II. DEFINITIONS 



A. SYNTAX AND SEMANTICS 

In order to understand the operation of the universal syntax checker, 
a knowledge of the methods used to formulate and describe languages is 
necessary. 

Because there is no standard, uniform notation throughout the 
literature, the definitions and notation from [1] and [2] are 
used in this discussion. 

The problem of describing a language involves a unique difficulty. 
During the discussion, a distinction must be made between the language 
being described and the describing language. If the language being 
described is called the "language", then the language in terms of 
which the description is being made is called the "metalanguage". 

If these two languages are not distinguishable, much ambiguity and 
imprecision result. A discussion of how languages are described is 
in order. 

In the exposition of automatic programming languages, the 
descriptions themselves can be classified into two types: syntactic 
definitions and semantic discussions. The syntactic definitions make 
explicit what structures are to be meaningful in the language. These 
definitions are concerned with the proper construction of words, 
expressions, and statements. The semantic discussions are concerned 
with the meanings to be attributed to the various structures and their 
proper usage in the language. 

It often happens that in the process of describing a new computer 
language, the formats for statements are presented in terms of the natural 
language. These formats define which components are necessary and how 
they are to be put together to form meaningful statements. These are 
the syntax and semantics of the language. 

The syntax of a FORTRAN DO statement can be given as the natural 
language definition: 

    DO n i = $m_1$, $m_2$, $m_3$ 

where 

    n is a statement identifier, 
    i is a simple integer variable, 
    and $m_1$, $m_2$, $m_3$ are simple integer variables. 

The syntax definition, however, is not sufficient to specify a 
properly formed and meaningful DO statement. It is also necessary 
to discuss the semantics of the DO statement. The semantics could 
be given as follows: 

    $V(m_2) > V(m_1) > 0$ and $V(m_3) > 0$ 

It is also necessary that n refer to a statement not previously defined 
in the subprogram [1]. 

In general, the syntax and semantics of an entire language (i.e., 
of each statement in the language) are given in order to specify the 
proper construction of meaningful statements in that language. 

To formalize the definitions in the metalanguage, each definition 
is given the form of a statement or construct, sometimes called a 
production [3]. Thus the syntactic definition of an "unsigned integer" 
could be: 

    An unsigned integer can be formed by writing 
    a digit, or 
    an unsigned integer followed by a digit. 

This definition is also an example of a recursive definition. That 
is, the term "unsigned integer" is defined in terms of itself. 
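
The recursive definition can be read almost directly as a recursive test. The following Python sketch is only an illustration; the name `is_unsigned_integer` is hypothetical and does not come from the thesis.

```python
DIGITS = set("0123456789")

def is_unsigned_integer(text: str) -> bool:
    """Return True if text fits the recursive definition of an unsigned integer."""
    if text == "":
        return False
    if len(text) == 1:
        # Base case: a single digit.
        return text in DIGITS
    # Recursive case: an unsigned integer followed by a digit.
    return text[-1] in DIGITS and is_unsigned_integer(text[:-1])
```

Calling `is_unsigned_integer("1969")` peels off one trailing digit per call until the single-digit base case is reached.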

Syntactic definitions can frequently be shortened by using a 
metalanguage symbolism. The following symbols will be used: 

    SYMBOL     INTERPRETATION 

    < x >      angular brackets; a syntactic category, the object named x. 
    ::=        "can be written as" or "is defined as". 
    |          reads as "or". 

To repeat the recursive definition of an unsigned integer using 
the above notation: 

    < unsigned integer > ::= < digit > | 
                             < unsigned integer > < digit > 

This method of syntactic definition of a language is called 
"specification by Backus normal form," abbreviated B.N.F. (1) 

The formalism for the semantic discussion of a particular language 
is not readily available or apparent. Although attempts have been 
made to make this formalization [1], it is not of concern here since 
the universal syntax checker verifies the syntax, not the semantics, 
of a particular language. 

(1) Strictly speaking, the form is not a normal form; B.N.F. is often 
called Backus-Naur form. Naur introduced the actual notation, and 
Backus introduced the concept [2]. 



B. GRAMMAR AND LANGUAGE 



As mentioned earlier, the universal syntax checker accepts the 
definition of a language and programs written in the language. 

The second input to the checker (a program written in the 
described language) necessitates a discussion of grammar. 

Mention of the word "grammar" brings to mind many thoughts. Most 
people relate the word to something learned in school, or think of a 
set of words and rules which describe some language. Webster defines 
grammar in the following manner: 

    The science treating of the classes of words, their 
    inflections, and their syntactical relations. 

Most of us recall dissecting and diagramming sentences to learn the 
syntactical relation of the parts of a sentence. This is known as 
parsing. For example, the sentence "The little girl talks fast." 
may be dissected after a syntactic definition of a small subset of 
English. 



    1   < sentence >       ::=  < noun phrase > < verb phrase > 
    2   < noun phrase >    ::=  < adjective > < noun phrase > | 
    3                           < adjective > < singular noun > 
    4   < verb phrase >    ::=  < singular verb > < adverb > 
    5   < adjective >      ::=  the | 
    6                           little 
    7   < singular noun >  ::=  girl 
    8   < singular verb >  ::=  talks 
    9   < adverb >         ::=  fast 

Figure 2-1. Constructs of a sentence 

Through the application of the syntax rules the sentence is parsed. 
The results, when diagrammed, yield a syntax tree. Figure 2-2 shows 
the diagram of this particular syntax tree. 

                        < sentence > 
                       /            \ 
          < noun phrase >            < verb phrase > 
           /           \                /         \ 
< adjective >    < noun phrase >   < singular verb >  < adverb > 
      |            /         \            |               | 
      |   < adjective >  < singular noun >|               | 
      |         |              |          |               | 
     The      little         girl       talks           fast 

Figure 2-2. Diagram of the sentence 
"The little girl talks fast." 



The application of the rules in Figure 2-1 to the sentence "The 
little girl talks fast." reveals that the sentence is grammatically 
correct. One should observe that sentences can not only be tested 
but may also be generated by the application of the same rules. Thus 
application of rule 1 followed by as many applications of rule 2 as 
desired, and so on until no further application of rules is possible, 
would also yield a grammatically correct sentence. The sentence might 
not mean anything, but it would be grammatically correct. The semantics 
of a language prevent improper statement formulation. For example, 
the application of rule 1, followed by the application of rule 2 twice, 
would yield the sentence, "The the little girl talks fast," which is 
grammatically correct, but nonsensical just the same. 
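
To make the generative use of the rules concrete, the following Python sketch applies the productions of Figure 2-1 by rule number, always rewriting the left-most occurrence of the rule's left part. The names `RULES` and `apply_rules` are hypothetical, and the left-most rewriting order is an assumption made for the illustration.

```python
# Productions of Figure 2-1, keyed by rule number: (left part, right part).
RULES = {
    1: ("sentence", ["noun phrase", "verb phrase"]),
    2: ("noun phrase", ["adjective", "noun phrase"]),
    3: ("noun phrase", ["adjective", "singular noun"]),
    4: ("verb phrase", ["singular verb", "adverb"]),
    5: ("adjective", ["the"]),
    6: ("adjective", ["little"]),
    7: ("singular noun", ["girl"]),
    8: ("singular verb", ["talks"]),
    9: ("adverb", ["fast"]),
}

def apply_rules(goal, rule_numbers):
    """Rewrite the left-most matching symbol once for each rule number given."""
    form = [goal]
    for number in rule_numbers:
        left, right = RULES[number]
        position = form.index(left)  # raises ValueError if the rule does not apply
        form[position:position + 1] = right
    return form

# Rule 1, then rule 2 twice, then rule 3 and the terminal rules.
words = apply_rules("sentence", [1, 2, 2, 3, 5, 5, 6, 7, 4, 8, 9])
print(" ".join(words))
```

The rule sequence shown uses rule 2 twice and therefore produces three adjectives, which is how the nonsensical but grammatical sentence arises.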

To formalize the grammar presented, four ingredients were necessary. 
First, there were the syntactic categories (noun phrase, adjective, 
etc.) from which strings of words were derived. These syntactic 
categories are called non-terminal symbols. Second, there were the 
words from which the sentences could be constructed. These objects 
are called terminal symbols. Third, there were the relations between 
the various strings of terminal and non-terminal symbols, which are 
called productions. These productions are the rules of syntax, as 
shown in Figure 2-1. And last, there was one distinguished 
non-terminal symbol, which appeared nowhere on the right of any 
production rule. This distinguished symbol is referred to as the start 
symbol [4], or the goal symbol [2]. The non-terminal symbol "sentence" 
was the goal symbol in the example given. 

Thus a grammar G may be defined as $G = (V_n, V_t, P, Z)$. The symbols 
$V_n$, $V_t$, $P$, and $Z$ represent, in order, the set of non-terminal 
symbols, the set of terminal symbols, a set of productions, and the 
goal symbol. 

Let $V_n^*$, $V_t^*$, and $V^*$ be the sets of finite concatenations of 
symbols from $V_n$, $V_t$, and $V_n \cup V_t$ respectively. These 
concatenations are called strings. 

Let $\overset{*}{\underset{G}{::=}}$ represent the application of a 
finite sequence of productions from G. A string $\beta \in V_t^*$ is 
said to be a "derivation" of $\alpha \in V_n$ if and only if there 
exists a sequence of productions such that 
$\alpha \overset{*}{\underset{G}{::=}} \beta$. 

Let G be the subset of English defined earlier. Letting $\alpha$ = 
< sentence > and $\beta$ = "The little girl talks fast.", the parsing 
tree of Figure 2-2 shows that 

    < sentence > $\overset{*}{\underset{G}{::=}}$ "The little girl talks fast." 

A language L(G) is defined as 
$L(G) = \{\, w \mid w \in V_t^* \text{ and } Z \overset{*}{\underset{G}{::=}} w \,\}$ [4]. 

That is, a string is in L(G) if and only if: 

1. the string consists of a finite string of terminal symbols, 
   and 

2. the string can be derived from Z by the application of a 
   finite sequence of productions from the grammar G. 

The set of finite strings of terminal symbols defined by a grammar G 
is called the set of final sentential forms. 

Now the additional input to the universal syntax checker may be 
firmly identified. The statement, "and a program written in the 
described language," means that the syntax checker is provided a 
string (program) which is a member of $V_t^*$. 

Thus, to summarize, the universal syntax checker receives the 
following two inputs: 

1. A description of a language L in B.N.F. (the productions, 
   non-terminal symbols, and terminal symbols). 

2. A finite string of terminal symbols from the described 
   language (an element of $V_t^*$). 

A formal description of the universal syntax checker is now 
possible. 

Given the B.N.F. of a language L, and a finite string of terminal 
symbols from the described language (an element of $V_t^*$), the 
universal syntax checker will determine whether that string is, or is 
not, a member of L(G) for 
the language L defined. 
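
As an illustration of the first input, the following Python sketch splits a B.N.F. text into productions, non-terminal symbols, and terminal symbols. It is not the thesis's input routine: the function name `read_bnf` is hypothetical, and the sketch assumes one rule per line, bracketed category names written consistently, and that "|" is never itself a terminal symbol.

```python
import re

def read_bnf(text):
    """Split B.N.F. text into productions, non-terminals, and terminals."""
    productions = []  # list of (left part, [right-part symbols])
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        left = left.strip()
        for alternative in right.split("|"):
            # A symbol is a bracketed category or a run of other non-blank characters.
            symbols = re.findall(r"<[^>]+>|[^\s<>]+", alternative)
            productions.append((left, symbols))
    # Every left part is a non-terminal; every other symbol is a terminal.
    non_terminals = {left for left, _ in productions}
    terminals = {s for _, right in productions for s in right} - non_terminals
    return productions, non_terminals, terminals
```

Each alternative separated by "|" becomes its own production, which matches the one-row-per-alternative layout used for the tables in Chapter III.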

C. RELATED RESEARCH 

The use of "syntax-directed" techniques is not new. This 
technique has been used in constructing natural language translators 
[5], compilers [6, 7], automatic programming language translators 
[8, 9, 10], and context-free grammar recognizers [11]. 

Syntax-directed analysis of natural languages is an unsolved problem, 
due primarily to the inability to define precisely the syntax of each 
language. An excellent survey of the techniques employed can be found 
in Bobrow [5]. 

Griffiths and Petrick [12] describe many types of recognition 
procedures, all syntax-directed, for context-free grammars. They 
employ Turing machine algorithms to highlight the various methods. 
Turing machines were used to preserve clarity and conciseness, and to 
allow comparisons of procedures on the same level of complexity. The 
universal syntax checker more closely approaches the selective top to 
bottom (STB) method than any other described. 

Irons [6] develops a syntax-directed ALGOL 60 compiler. The 
arrangement of the various tables that contain the specifications of the 
language is very similar to that employed in the universal syntax 
checker. Each specifies entries for all syntactic units, as integers, 
in the described language with a one-to-one correspondence between the 
actual symbol vector and the integer representation of that description. 
The two methods do differ in that Irons uses a bottom-to-top method; 
hence his need for a precedence matrix to ensure that the longest 
string possible is accepted. The syntax checker instead requires 
reordering of the B.N.F. to present the longest string possible before 
any shorter string. 

Feldman and Gries discuss the pushdown stack method [3] as one 
of the two ways to achieve a parsing process. This same method is 
an integral part of the "cellar principle" used to design a syntax 
controlled generator of formal language processors [13]. The cellar 
principle is based on sequential processing of the input symbols. 
The syntax checker also employs sequential processing of symbols, and 
the recursive procedures available in PL/1 incorporate the pushdown 
stack implicitly for all variables. 

Barnett and Futrelle present an account of the SHADOW language, 
which is used to describe the syntax of a language, and an associated 
subroutine which parses an input string [14]. The method requires 
that a mnemonic argument, in addition to the input string, be provided 
to the SHADOW subroutine. These arguments include the names of arrays 
which contain the string and the syntax. Thus, if a parse of a rational 
fraction is desired, the mnemonic RATFRN is a required input argument. 
The appearance of this mnemonic, requesting a syntactic analysis, 
causes the subroutine to use the most recently read requested pattern 
name and input string. This system is obviously unsuited for the 
time-sharing environment due to the requirement that the programmer be 
familiar with the syntax of the language in use. 

Unger presents a Global Parser (GP) for phrase structured 
grammars [11]. The method he employs is also very close to the method 
used in the universal syntax checker. Some major differences do exist. 
The primary difference is his use of a set of routines for determining 
possible prefixes and suffixes of N-derivable strings, finding the 
minimum length of such strings, and sub-strings that can never appear 
in N-derivable strings. Another difference is the method of comparison 
between the production and input string. Unger claims that by matching 
all the terminal symbols of the intermediate string against the input 
string and constantly partitioning the input string, a broad class of 
checks can be made to terminate fruitless paths quickly. This parser 
will not handle cyclic definitions. 

An error correcting parse algorithm is described by Irons utilizing 
a syntax-directed scheme [15]. The algorithm provides two services. 
First, the algorithm provides a parse of strings written in the language 
described. Second, if an incorrect string is presented, the algorithm 
will make substitutions, insertions, and deletions to make the object 
string syntactically correct. The approach is different from the 
syntax checker in that all possible parses are carried along until 
one can be determined to be correct. Backtracking and recursive 
productions are not allowed. Recursive productions are replaced by a 
similarly powerful definition, which allows iterations in pairs of 
symbols. 

Most authors agree that there are several advantages and 
disadvantages to syntax-directed procedures. Among the advantages are 
the simplicity of compiler construction and the ability to change the 
specifications of a particular language. In addition languages may be 
switched by merely changing the contents of the syntax table. Further, 
the syntax-directed parser can take into account as large a context as 
is required to perform the parsing. The disadvantages include the 
difficulty in trying to specify the syntax of a language, the fact 
that syntax-directed compilers contribute little toward the generation 
of optimum code, and generally poor error analysis and recovery. Error 
type determination is nearly impossible. 

D. PARSING METHOD 

It has been shown that taking a string of symbols and a grammar, 
and constructing a derivation of the string to form its syntax tree, 
is called parsing, recognizing, or analyzing. 

There are two basic types of parsing methods: top-down and bottom-up. 
The bottom-up method will not be covered but is explained in [3]. The 
top-down parser gets its name from the fact that it is goal oriented. 
The top-down parser starts with the most global production (goal symbol) 
and works its way down the productions, attempting to match the input 
string. Each method can be further qualified as left-right or 
right-left, depending on the order of processing the symbols. 

In order to discuss the manner employed by the syntax checker to 
accomplish its assigned tasks, consider the following example of a 
language and a method to parse such a language: 

1. Grammar 

Given the following grammar: 

    < Z > ::= ⊥ < a > ⊥ 
    < a > ::= < b > | 
              < a > + < b > 
    < b > ::= < c > | 
              < b > - < c > 
    < c > ::= T | 
              ( < a > ) 

Non-terminal symbols: Z, a, b, c. 

Terminal symbols: T, ⊥, +, -, (, ). 

2. Derivations 

To discuss ambiguity it is necessary first to discuss 
derivations. Observe that the following string has more than one 
derivation. Thus, < c > + < b > - < c > can be derived by two sets 
of productions: 

    < Z > ::= ⊥ < a > ⊥ ::= < a > + < b > ::= < b > + < b > 
          ::= < c > + < b > ::= < c > + < b > - < c > 

and 

    < Z > ::= ⊥ < a > ⊥ ::= < a > + < b > ::= < a > + < b > - < c > 
          ::= < b > + < b > - < c > ::= < c > + < b > - < c > 



3. Syntax Trees 

The construction of the syntax tree is now shown: 

                 < a > 
               /   |   \ 
          < a >    +    < b > 
            |          /  |  \ 
          < b >   < b >   -   < c > 
            |       |           | 
          < c >   < c >         T 
            |       | 
            T       T 



Although there may be more than one set of productions 
which yields the same string of symbols, their syntax-trees are the 
same. Ambiguity exists only when one or more strings have more than 
one syntax tree. 

Observe that the syntax tree shown below would result if the 
string < c > + < b > - < c >, not a member of L(G), is parsed by 
proceeding left at every junction down the tree (applying the 
left-most derivation). 

              < Z > 
            /   |   \ 
           ⊥  < a >  ⊥ 
                | 
              < b > 
                | 
              < c > 
                | 
                T 

4. Procedure for a top-down left-right parser 

To acquire familiarity with the top-down left-right parser, 
consider the following procedure. The top-down left-right parser makes 
an initial assumption that the string presented is a valid final 
sentential form, and then proceeds as follows: 

    Step 1. Establish goal symbol, in this case Z. 
    Step 2. Apply a production to the goal or subgoal. 
    Step 3. First symbol of resultant string a terminal? 
            Yes, continue; No, go to step 6. 
    Step 4. Terminal symbol and input string match? 
            Yes, continue; No, go to step 7. 
    Step 5. More symbols in string of production? 
            Yes, continue; No, done. 
    Step 6. Establish subgoal and go to step 2. 
    Step 7. An alternate (OR) production? 
            Yes, go to step 6; No, done. 



A slightly modified form of the above procedure will recognize 
grammars with left, right, and self-embedded recursion. In addition, 
ambiguous grammars are processed such that when two or more syntax 
trees are available, the first, as determined by its placement in the 
B.N.F., will be recognized. 

This simplified procedure has been provided to acquaint the 
reader with the general procedure of top-down left-right parsing. 
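
The general idea of the procedure can be sketched in Python as a recursive recognizer that tries the alternatives of each non-terminal in order and backs up when a terminal fails to match. This is an illustration, not the thesis's RECOGNIZE procedure. It assumes the grammar of Section D with its left-recursive alternatives rewritten as right-recursive ones, since a naive top-down descent would otherwise never terminate; the names `GRAMMAR`, `END`, `match`, `match_sequence`, and `recognize` are hypothetical.

```python
END = "\u22a5"  # the end marker of the grammar in Section D

# Right-recursive rewriting of the Section D grammar (an assumption of this sketch).
GRAMMAR = {
    "Z": [[END, "a", END]],
    "a": [["b", "+", "a"], ["b"]],
    "b": [["c", "-", "b"], ["c"]],
    "c": [["T"], ["(", "a", ")"]],
}

def match(symbol, tokens, pos):
    """Yield every position at which `symbol` can end when started at `pos`."""
    if symbol not in GRAMMAR:
        # Terminal symbol: compare it with the next input symbol.
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    # Non-terminal symbol: establish it as a subgoal and try each production in order.
    for alternative in GRAMMAR[symbol]:
        yield from match_sequence(alternative, tokens, pos)

def match_sequence(symbols, tokens, pos):
    """Match the symbols of one right part from left to right."""
    if not symbols:
        yield pos
        return
    for middle in match(symbols[0], tokens, pos):
        yield from match_sequence(symbols[1:], tokens, middle)

def recognize(tokens, goal="Z"):
    """Return True if the whole token list can be derived from the goal symbol."""
    return any(end == len(tokens) for end in match(goal, tokens, 0))

print(recognize(list(END + "T+T-T" + END)))
```

Because the generators hand back every possible end position, a failed match later in the input automatically resumes the search at the most recent untried alternative, which is the backing-up behavior described in step 7.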
III. SUPPORT ROUTINES 



A. DISCUSSION 

The previous chapters presented the schema of the universal syntax 
checker. Included was an explanation of the input requirements of the 
system as well as a simplified method to accomplish the syntax checking 
task. 

Before a detailed discussion of the actual parsing algorithm can 
begin, it is necessary to describe certain grammar manipulations and 
tables required to support the algorithm. 

B. TABLES 

The manner in which the grammar is placed into the supporting 
tables is now presented. Note that the identifiers enclosed in 
parentheses immediately following the table titles are the actual 
identifiers which were used in the coding of the universal syntax 
checker program. 

1. B.N.F. Table (P) 

This table contains the definitions of the B.N.F. in the form 
< identifier > ::= < letter > | < identifier > < letter >, and appears 
in the table as shown in Figure 3-1, where i is the number of symbols 
in the production and j is the number of productions describing the 
language. The example is taken from the first grammar listed in the 
computer program listing. 

    P        1             2             3         ...    i 

    1    IDENTIFIER    LETTER 
    2    IDENTIFIER    IDENTIFIER    LETTER 
    3    LETTER        A 
    4    LETTER        B 
    5    LETTER        C 
    .        .             . 
    j 

Figure 3-1. The B.N.F. table. 



2. Symbol Table (SYMTAB) 

The symbol table is constructed from the grammar placed in 
the B.N.F. table. This symbol table contains all the elements of $V_n$ 
followed by all the elements of $V_t$. The first entry in the table is 
the empty string. 

Since no terminal symbol may appear as the left part of a 
production, the non-terminal symbols are located by examining the 
first column of the B.N.F. table and applying the following logic: 

    Step 1. Is the symbol being examined listed in the symbol table? 
            Yes, examine next symbol; No, go to step 2. 
    Step 2. Put the symbol in the symbol table, increment 
            the table pointer, and examine the next symbol. 

After noting the location of the last non-terminal symbol, the 
terminal symbols are placed into the table. Since no terminal 
symbol may appear on the left of a production, these symbols are 
determined using the same logic as before, excluding the first column 
and examining the remaining columns of the B.N.F. table. Referring to 
the grammar shown in Figure 3-1, Figure 3-2 shows the symbol table upon 
completion of its construction. The identifier nsymbols refers to 
the total number of symbols in the grammar. 
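
The two passes over the B.N.F. table can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis's PL/1 code: the function name `build_symtab` is hypothetical, and entry 0 of the list is left unused so that subscripts start at 1 as in the figures.

```python
# B.N.F. table of Figure 3-1: one row per production, left part in the first column.
P = [
    ["IDENTIFIER", "LETTER"],
    ["IDENTIFIER", "IDENTIFIER", "LETTER"],
    ["LETTER", "A"],
    ["LETTER", "B"],
    ["LETTER", "C"],
]

def build_symtab(bnf_table):
    """Return (symtab, nonterm); symtab[0] is unused, symtab[1] is the empty string."""
    symtab = [None, ""]
    # First pass: the first column holds the left parts, i.e. the non-terminals.
    for row in bnf_table:
        if row[0] not in symtab:
            symtab.append(row[0])
    nonterm = len(symtab) - 1  # subscript of the last non-terminal
    # Second pass: any symbol in the remaining columns not yet listed is a terminal.
    for row in bnf_table:
        for symbol in row[1:]:
            if symbol not in symtab:
                symtab.append(symbol)
    return symtab, nonterm

symtab, nonterm = build_symtab(P)
nsymbols = len(symtab) - 1
print(symtab[1:], nonterm, nsymbols)
```

With this layout, a subscript greater than `nonterm` identifies a terminal symbol, which is the comparison the parsing algorithm relies on.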

3. Onright Table (ONRIGHT) 

There is a one-to-one correspondence between the subscript 
numbers of the symbol table and the subscript numbers of the onright 
table. 

    SYMTAB 

        1         (empty string) 
        2         IDENTIFIER 
        3         LETTER 
        .         . 
        40        (last non-terminal) 
        41        A 
        42        B 
        43        C 
        .         . 
        nsymbols  (last symbol) 

Figure 3-2. The Symbol table. 

Thus, ONRIGHT(i) is marked "true" if and only if the symbol in 
SYMTAB(i) appears in the right part of some production. A search 
of the "false" entries in the onright table produces the goal symbol. 
More than one "false" entry indicates that the goal symbol is not 
unique and therefore the grammar is unacceptable. Figure 3-3 depicts 
the onright table for an acceptable grammar. 

    ONRIGHT 

        1         True 
        2         False 
        3         True 
        4         True 
        .         . 
        nsymbols  True 

Figure 3-3. The Onright table. 
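
A possible way to build the onright table and search it for the goal symbol is sketched below in Python. Two points are assumptions of the sketch rather than statements from the thesis: the empty string in entry 1 is marked true so that it is never taken for a goal, and a symbol occurring in the right part of its own production is ignored, so that a recursively defined goal symbol such as IDENTIFIER is still found. The names `build_onright` and `goal_symbol` are hypothetical.

```python
# B.N.F. table of Figure 3-1 and symbol table of Figure 3-2 (entry 0 unused).
P = [
    ["IDENTIFIER", "LETTER"],
    ["IDENTIFIER", "IDENTIFIER", "LETTER"],
    ["LETTER", "A"],
    ["LETTER", "B"],
    ["LETTER", "C"],
]
SYMTAB = [None, "", "IDENTIFIER", "LETTER", "A", "B", "C"]

def build_onright(bnf_table, symtab):
    """Mark onright[i] true when symtab[i] occurs in the right part of a production."""
    onright = [False] * len(symtab)
    onright[1] = True  # assumption: the empty string is never a goal candidate
    for row in bnf_table:
        for symbol in row[1:]:
            if symbol != row[0]:  # assumption: skip a symbol inside its own production
                onright[symtab.index(symbol)] = True
    return onright

def goal_symbol(symtab, onright):
    """Return the single symbol marked false, or reject the grammar."""
    candidates = [symtab[i] for i in range(1, len(symtab)) if not onright[i]]
    if len(candidates) != 1:
        raise ValueError("goal symbol is not unique; grammar unacceptable")
    return candidates[0]

onright = build_onright(P, SYMTAB)
print(goal_symbol(SYMTAB, onright))
```

For the grammar of Figure 3-1 this yields a single false entry at subscript 2, in agreement with Figure 3-3.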

4. Production Table (PR) 

There is a one-to-one correspondence between the production 
table (PR) and the B.N.F. table (P). Also, there exists an onto mapping 
from the entries in the production table to the subscript numbers of 
the symbol table. Thus, wherever the symbol in SYMTAB(i) appears 
in the B.N.F. table, the entry "i" is made in the production table. 
For example, consider the symbol table in Figure 3-2. Wherever the 
symbol IDENTIFIER appears in the B.N.F. table shown in Figure 3-1, 
an entry "2" is made in the production table as shown in Figure 3-4. 
Note the recursive production occurring in the second row. 
Figure 3-4. The Production table. 
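
The mapping from the B.N.F. table to the production table can be illustrated with a few lines of Python. The function name `build_pr` is hypothetical, and the sketch leaves rows at their natural length because the contents of unused table positions are not specified in the text.

```python
# B.N.F. table of Figure 3-1 and symbol table of Figure 3-2 (entry 0 unused).
P = [
    ["IDENTIFIER", "LETTER"],
    ["IDENTIFIER", "IDENTIFIER", "LETTER"],
    ["LETTER", "A"],
    ["LETTER", "B"],
    ["LETTER", "C"],
]
SYMTAB = [None, "", "IDENTIFIER", "LETTER", "A", "B", "C"]

def build_pr(bnf_table, symtab):
    """Replace every symbol of the B.N.F. table by its subscript in the symbol table."""
    return [[symtab.index(symbol) for symbol in row] for row in bnf_table]

PR = build_pr(P, SYMTAB)
print(PR)
```

In the second row the first two entries are equal, which is how a left-recursive production shows up once the symbols have been replaced by integers.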

IV. THE PARSING ALGORITHM 



A. BACKGROUND 

The inputs, general parsing procedure, and support tables of the 
universal syntax checker have been described. A presentation of the 
actual programming implementation remains. This implementation was 
accomplished utilizing the recursive procedures available in the PL/1 
programming language. The operation of the major procedure, called 
RECOGNIZE, is dependent upon a symbol buffer and accumulator. The 
buffer contains a portion of the input string. The accumulator 
provides symbol back-up and is the heart of the procedure NEXTSYM. 
These two procedures will be discussed briefly before a description 
of the algorithm is given in reference ALGOL [16]. 

B. PROCEDURES 

1. Nextsym 

The NEXTSYM procedure centers around the accumulator shown 
in Figure 4-1. Each element of the accumulator contains an entry 
"i" corresponding to SYMTAB(i) for each symbol recognized in the 
buffer. 

Associated with the accumulator are three pointers: ap, tap, 
and acclen. The accumulator pointer, ap, points to the symbol being 
examined. The temporary accumulator pointer, tap, retains the value 
of the accumulator pointer at each junction in the syntax tree that 
the procedure examines. The accumulator length pointer, acclen, 
retains the total number of accumulator positions filled. 

    Accumulator 

    position:   1    2    3    4    5    6    7   ...   47   48   49   50 
    contents:  35   60   42   43 

    (the pointers ap and acclen mark positions in the accumulator) 

Figure 4-1. Accumulator 

The accumulator will hold the first fifty symbols from the input 
string. Thereafter, it will contain the most recent forty to fifty 
symbols. The variable "offset" serves to adjust the subscript number 
i of the accumulator while allowing the accumulator pointer to increase 
sequentially. 
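
The sliding behavior of the accumulator can be sketched in Python as follows. This is a simplified illustration, not the NEXTSYM procedure itself: the class name `Accumulator` and the method `symbol_at` are hypothetical, the sketch returns the symbol rather than comparing it with an expected one, and the shift of ten positions is taken from the reference ALGOL listing.

```python
class Accumulator:
    """Keeps the most recent symbols so that the parser can back up."""

    def __init__(self, source, size=50, shift=10):
        self.source = iter(source)  # yields symbol-table subscripts
        self.size = size
        self.shift = shift
        self.cells = []             # cells[0] holds accumulator position 1
        self.offset = 0             # number of symbols already shifted out

    def symbol_at(self, ap):
        """Return the symbol at absolute position ap (1-based), reading ahead if needed."""
        i = ap - self.offset
        if i < 1:
            # The parser has backed up past the symbols still retained.
            raise LookupError("depth of search exceeded")
        while i > len(self.cells):
            if len(self.cells) == self.size:
                del self.cells[:self.shift]  # drop the oldest symbols
                self.offset += self.shift
                i -= self.shift
            self.cells.append(next(self.source))
        return self.cells[i - 1]
```

The pointer `ap` keeps increasing with the input while `offset` translates it into a position inside the fixed-size window, so the parser can back up by at most the number of symbols still retained.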

2. Recognize 

The RECOGNIZE procedure is a top-down left-right slow-back 
parser. The method, as described in Chapter II, has been implemented 
with one exception: the parser attempts to recognize the left-most 
derivation first. The B.N.F. presented to the universal syntax checker 
must be modified so that the left-most derivation is the longest string 
possible (see Appendix A). 

C. THE SYNTAX CHECKING ALGORITHM 

The description of the syntax checking algorithm follows (reference 
ALGOL). The ALGOL version of the algorithm has not been run on 
a computer, although the PL/1 version presented in the computer 
program section has been tested, and the results are in the computer 
output section. 

procedure MAIN; 
    integer bp, ap, acclen, offset, i, j, k, npr, nonterm, 
            nsymbols, bl, linecount, target, scan, a; 
    array Symtab [1:nsymbols], Buffer [1:80], Accum [0:50], 
          PR [1:n, 1:8]; 
    boolean array Onright [1:nsymbols]; 
    boolean change, empty, done, check; 
    comment This procedure is a top-down left-right slow-back parser. 
        To recognize a final sentential form of the B.N.F. presented to 
        the procedure, the distinguished symbol is established as the 
        goal symbol, and productions are applied, where valid, in the 
        order presented. Once a terminal symbol is encountered, the 
        symbol is matched with the next symbol in the input string. 
        If the match is successful, an attempt is made to match the 
        next symbol of the string, and so on. If a match is not 
        found, then an alternate production is attempted in an effort 
        to recognize the symbol in the buffer. On completion of the 
        procedure, the input string will be declared syntactically 
        correct or incorrect; 
    integer procedure LOOKUP (k); 
        value k; integer k; 
    begin comment Find the first production whose left part is k; 
        for j := 1 step 1 until npr do 
            if PR [j, 1] = k then begin 
                LOOKUP := j; 
                go to EXIT 
            end; 
    EXIT: end LOOKUP; 

    boolean procedure NEXTSYM (k); 
        value k; integer k; 
    begin comment This procedure controls the accumulator and checks 
        for recognition of the symbols in the buffer with the results 
        of the application of production rules; 
        integer i, m; 
        i := (ap + 1) - offset; 
        if i > acclen then begin 
            for i := i while (true) do 
                if i ≤ 50 then begin 
                    for j := acclen + 1 step 1 until i do begin 
                        accum [j] := SCAN; 
                        if check then write (' ', symtab [accum [j]]); 
                    end; 
                    acclen := i; 
                    if accum [i] = k then begin 
                        ap := ap + 1; 
                        NEXTSYM := true; 
                    end 
                    else NEXTSYM := false; 
                    go to A; 
                end 
                else begin 
                    comment Discard the ten oldest symbols; 
                    offset := offset + 10; 
                    for m := 1 step 1 until 40 do 
                        accum [m] := accum [m + 10]; 
                    acclen := acclen - 10; 
                    i := i - 10; 
                end 
        end 
        else if i < 1 then begin 
            write ('depth of search exceeded'); 
            NEXTSYM := false; 
        end 
        else if accum [i] = k then begin 
            ap := ap + 1; 
            NEXTSYM := true; 
        end 
        else NEXTSYM := false; 
    A: end NEXTSYM; 

boolean procedure RECOGNIZE (production, element);
   value production, element;
   integer production, element;
begin comment This procedure attempts to recognize each of the
   symbols in the input string. If a symbol is recognized then
   RECOGNIZE is set to true, otherwise false;
   integer k, lr, tap;
   lr := 0;
   if element ≤ 8 then tap := ap;
   if element = 1 then begin
      if PR [production, 1] = PR [production, 2] then
         if ¬ RECOGNIZE (production, 3) then begin
            ap := tap;
            RECOGNIZE := if PR [production, 1] = PR [production + 1, 1]
               then RECOGNIZE (production + 1, 1)
               else false ;
            go to OUT ;
         end
         else RECOGNIZE := true
      else if RECOGNIZE (production, 2) then begin
         tap := ap ;
         if check then begin
            write ('symbol found') ;
         for production := production while (PR [production, 1] =
            PR [production + 1, 1] /\ PR [production + 1, 1] ≠
            PR [production + 1, 2]) do
            production := production + 1
         else end ;
         if lr ≠ 0 then begin
            write (symbol, 'number of left recursive
               production found') ;
            ap := tap;
            RECOGNIZE := true;
         end
         else RECOGNIZE := false ;
         go to OUT
      else begin
         tap := ap ;
         RECOGNIZE := if PR [production, 1] = PR [production
            + 1, 1] /\ PR [production + 1, 1] ≠
            PR [production + 1, 2] then
            RECOGNIZE (production + 1, 1)
            else false ;
         go to OUT ;
      end

   else begin
      k := PR [production, element];
      RECOGNIZE := if k ≥ 1 then
            (if k > nonterm then
               (if NEXTSYM (k) then
                  RECOGNIZE (production, element + 1)
                else false)
             else
               (if RECOGNIZE (LOOKUP (k), 1)
                then RECOGNIZE (production, element + 1)
                else false))
         else true ;
   end
   else RECOGNIZE := true;
OUT:
end RECOGNIZE;
if ¬done then
begin if RECOGNIZE (1, 1) then write ('syntax ok')
   else write ('syntax error'); k := 0;
   for k := k while (k ≠ nsymbols /\ ¬done) do k := scan;
   bp := 72; ap := acclen := offset := linecount := 0;
end MAIN ;
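
For readers unfamiliar with ALGOL, the top-down, left-to-right, backtracking strategy described in the comment of MAIN can be sketched in Python. This is a minimal illustration under stated assumptions, not a transcription of the listing: the dictionary GRAMMAR, the toy productions in it, and the function names match_symbols and recognize are all introduced here for illustration, and the sketch does not handle left-recursive productions, which the listing treats as a special case.

```python
# Toy grammar (an assumption for illustration). Alternatives are tried in
# the order presented, with the longest alternative listed first.
GRAMMAR = {
    "<expr>": [["<term>", "+", "<expr>"], ["<term>"]],
    "<term>": [["a"], ["b"], ["(", "<expr>", ")"]],
}


def match_symbols(symbols, tokens, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Nonterminal: apply its productions in the order presented and
        # fall back to the next alternative when one fails.
        for alternative in GRAMMAR[first]:
            for mid in match_symbols(alternative, tokens, pos):
                yield from match_symbols(rest, tokens, mid)
    elif pos < len(tokens) and tokens[pos] == first:
        # Terminal: it must equal the next symbol of the input string.
        yield from match_symbols(rest, tokens, pos + 1)


def recognize(tokens, goal="<expr>"):
    """Return True if the whole token list derives from the goal symbol."""
    return any(end == len(tokens) for end in match_symbols([goal], tokens, 0))
```

With this toy grammar, recognize(list("a+b")) returns True and recognize(list("a+")) returns False. The generator plays the role of the saved accumulator pointer in the listing: each alternative restarts from the same input position.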
V. CONCLUSIONS



A. USES 



The algorithm presented has many applications. The one to which
it is best suited is use in a time-sharing environment.

In a time-sharing system, the universal syntax checker could reside 
on a direct access storage device, along with the B.N.F. definitions, 
to be called when desired. Each terminal user would be able to time- 
share this syntax checker regardless of the language used. The syntax 
checker would provide a very rapid syntax check for each user at each 
terminal. As soon as the first syntax error was encountered, syntax 
checking of that program would end. The user at an on-line terminal 
would then examine the incorrect statement and effect a correction. 

The syntax checker would then restart the program analysis. Figure 5-1
depicts a possible configuration for a time-sharing system with a
syntax checker.

[Diagram: TERMINALS and DASD]

Figure 5-1. Time-sharing system with syntax checker.
B. FURTHER RESEARCH AND IMPROVEMENTS



One obvious need is the actual implementation of the universal 
syntax checker in a time-sharing system. 

Two major improvements are also needed. First, a recovery procedure
is needed which will allow recovery to a logical restart point, so that
syntax checking can continue after a syntax error is encountered.
Second, there is a need for a more complete and precise set of
diagnostic messages. Simply to say "syntax ok" or "syntax error" is not
enough. Precise statements of where the error is, what it is, and why
it is an error are needed.
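
One common way to obtain the "where" and "what" of such a message from a backtracking recognizer is to remember the right-most input position at which a terminal failed to match. The following Python sketch illustrates that idea only; it is not part of the algorithm presented, and the toy grammar, the function name check, and the message wording are assumptions made for illustration.

```python
# Toy grammar (an assumption for illustration).
GRAMMAR = {
    "<stmt>": [["<id>", ":=", "<id>", ";"]],
    "<id>": [["x"], ["y"]],
}


def check(tokens, goal="<stmt>"):
    """Return (ok, message); on failure the message says where and what."""
    furthest = {"pos": -1, "expected": set()}

    def note_failure(pos, wanted):
        # Keep only the failures at the right-most position reached.
        if pos > furthest["pos"]:
            furthest["pos"], furthest["expected"] = pos, {wanted}
        elif pos == furthest["pos"]:
            furthest["expected"].add(wanted)

    def match(symbols, pos):
        if not symbols:
            yield pos
            return
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:
            for alternative in GRAMMAR[first]:
                for mid in match(alternative, pos):
                    yield from match(rest, mid)
        elif pos < len(tokens) and tokens[pos] == first:
            yield from match(rest, pos + 1)
        else:
            note_failure(pos, first)

    ends = list(match([goal], 0))
    if len(tokens) in ends:
        return True, "syntax ok"
    for end in ends:
        # The goal was recognized but input symbols were left over.
        note_failure(end, "end of input")
    pos = furthest["pos"]
    found = tokens[pos] if pos < len(tokens) else "end of input"
    expected = " or ".join(sorted(furthest["expected"]))
    return False, f"syntax error at symbol {pos + 1}: found {found!r}, expected {expected}"
```

For the input x := ; the sketch reports an error at symbol 3, where ";" was found and "x or y" was expected. The "why" of an error would still require language-specific knowledge that a grammar alone does not supply.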
APPENDIX A



Backus-normal form requirements.

1. Maximum number of characters per symbol is ten. 

2. Maximum number of symbols per production is eight. 

3. Maximum number of productions per B.N.F. is three hundred. 

4. Left recursive productions must include the trivial production 
prior to the recursive production. 

5. Cyclic productions require the trivial productions as in 4. 

6. The B.N.F. must be ordered to present the longest string possible
prior to any other production.

7. The character string $PROGRAM is not allowed in a production for
a language. This string is used as an indicator to mark the end
of the B.N.F. and the start of each program submitted for parsing.

8. Place the characters $PROGRAM immediately after the B.N.F.
submitted and immediately after each program submitted for syntax
checking.
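
Several of the requirements above are mechanical and could be verified before a B.N.F. is handed to the checker. The Python sketch below covers requirements 1, 2, 3, 4, and 7. It rests on assumptions: each production is represented as a list of symbols whose first element is the left-hand side (so the limit of eight counts the left-hand side, following the PR [1:n, 1:8] declaration), a "trivial production" is read as a production of the same left-hand side that is not left recursive, and the name check_bnf is hypothetical.

```python
MAX_SYMBOL_CHARS = 10
MAX_SYMBOLS_PER_PRODUCTION = 8
MAX_PRODUCTIONS = 300
SENTINEL = "$PROGRAM"


def check_bnf(productions):
    """Return a list of violations of requirements 1-4 and 7 (empty if none)."""
    problems = []
    if len(productions) > MAX_PRODUCTIONS:
        problems.append(f"more than {MAX_PRODUCTIONS} productions")
    seen_trivial = set()  # left-hand sides that already have a non-recursive production
    for number, production in enumerate(productions, start=1):
        if len(production) > MAX_SYMBOLS_PER_PRODUCTION:
            problems.append(
                f"production {number}: more than {MAX_SYMBOLS_PER_PRODUCTION} symbols"
            )
        for symbol in production:
            if len(symbol) > MAX_SYMBOL_CHARS:
                problems.append(f"production {number}: symbol {symbol!r} is too long")
            if SENTINEL in symbol:
                problems.append(f"production {number}: contains {SENTINEL}")
        if len(production) >= 2:
            lhs, first_rhs = production[0], production[1]
            if lhs == first_rhs:
                # Left recursive: the trivial production must come first.
                if lhs not in seen_trivial:
                    problems.append(
                        f"production {number}: left recursive before its trivial production"
                    )
            else:
                seen_trivial.add(lhs)
    return problems
```

Requirement 6 (longest string first) is not checked here, since deciding it in general depends on the strings the productions derive and not only on their written form.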
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